New Results with the Lincoln Tied-Mixture HMM CSR System
Abstract
This paper describes recent work on the Lincoln CSR system. Several new variations in semiphone modeling have been tested. A very simple improved duration model has reduced the error rate by about 10% in both triphone and semiphone systems. A new training strategy was tested which, by itself, did not provide useful improvements but suggests that improvements can be obtained by a related rapid adaptation technique. Finally, the recognizer has been modified to use bigram back-off language models. The system was then transferred from the RM task to the ATIS CSR task, and a limited number of development tests were performed. Evaluation test results are presented for both the RM and ATIS CSR tasks.

INTRODUCTION

The following experiments were all carried out in the context of the Lincoln tied-mixture (TM) hidden Markov model (HMM) continuous speech recognition (CSR) system. This system uses two observation streams (TM-2) for speaker-dependent (SD) recognition: mel-cepstra and time-differential mel-cepstra. For speaker-independent (SI) recognition, a second-differential mel-cepstral observation stream is added (TM-3). The system uses Gaussian tied-mixture [1, 2] observation pdfs and treats each observation stream as if it were statistically independent of all the others.

Triphone models [14], including cross-word triphone models [10, 7, 16], are used to model phonetic coarticulation. These models are smoothed with reduced-context phone models [14]. Each phone model is a three-state "linear" (no skip transitions) HMM. The phone models are trained by the forward-backward algorithm using an unsupervised monophone (context-independent phone) bootstrapping procedure. The recognizer extrapolates (estimates) untrained phone models and recognizes using a Viterbi beam search. The initial implementation uses finite-state grammars, contains an adaptive background model, and allows optional inter-word silences.
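The tied-mixture observation model described above can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and uses hypothetical names (`stream_loglik`, `state_loglik`, `codebooks`); none of this is taken from the Lincoln implementation. Each stream has one Gaussian codebook shared by all states, each state holds only its own mixture weights, and the per-stream log-likelihoods are summed because the streams are treated as independent.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (an assumption) at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def stream_loglik(x, codebook, weights):
    """log sum_k w_k N(x; mean_k, var_k) for one observation stream.

    The codebook (list of (mean, var) pairs) is shared by every state;
    only the weights are state-specific. At least one weight must be > 0.
    """
    terms = [
        math.log(w) + gaussian_logpdf(x, mean, var)
        for w, (mean, var) in zip(weights, codebook)
        if w > 0.0
    ]
    return logsumexp(terms)


def state_loglik(frame, codebooks, state_weights):
    """Sum the per-stream log-likelihoods (stream-independence assumption)."""
    return sum(
        stream_loglik(x, codebook, weights)
        for x, codebook, weights in zip(frame, codebooks, state_weights)
    )
```

A TM-2 system would pass two streams per frame and a TM-3 system three; the function itself does not depend on the number of streams.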
All RM1 development tests use the designated SD development test set (100 sentences x 12 speakers) and all RM2 tests use the designated development test set (120 sentences x 4 speakers).

1 This work was sponsored by the Defense Advanced Research Projects Agency.
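The three-state linear phone topology and the Viterbi beam search mentioned in the introduction can be sketched as follows. This is an illustrative toy for a single phone model, not the Lincoln recognizer: the function name `viterbi_beam`, the shared stay/advance log-probabilities, and the fixed beam width are all assumptions, and the exit probability of the final state is ignored.

```python
import math

NEG_INF = -math.inf


def viterbi_beam(obs_loglik, log_stay, log_move, beam):
    """Best path through a linear (no-skip) HMM with beam pruning.

    obs_loglik[t][s] is the observation log-likelihood of frame t in state s.
    Each state may only loop (log_stay) or advance by one state (log_move).
    The path starts in state 0 and must end in the last state.
    """
    n_states = len(obs_loglik[0])
    scores = [NEG_INF] * n_states
    scores[0] = obs_loglik[0][0]
    backptrs = []
    for frame in obs_loglik[1:]:
        best = max(scores)
        new_scores = [NEG_INF] * n_states
        ptr = [-1] * n_states
        for s in range(n_states):
            # Beam pruning: drop states too far below the current best score.
            if scores[s] < best - beam:
                continue
            for dest, log_p in ((s, log_stay), (s + 1, log_move)):
                if dest < n_states:
                    cand = scores[s] + log_p + frame[dest]
                    if cand > new_scores[dest]:
                        new_scores[dest] = cand
                        ptr[dest] = s
        scores = new_scores
        backptrs.append(ptr)
    state = n_states - 1
    if scores[state] == NEG_INF:
        return NEG_INF, []  # the final state was pruned or never reached
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return scores[-1], path


# Four frames that favor states 0, 1, 1, 2 in turn.
frames = [[0, -9, -9], [-9, 0, -9], [-9, 0, -9], [-9, -9, 0]]
score, path = viterbi_beam(frames, math.log(0.5), math.log(0.5), beam=5.0)
```

With these toy values the recovered path is `[0, 1, 1, 2]`. A narrower beam saves computation but can prune the state that the best path needs, in which case the function returns an empty path.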
Similar resources
The Lincoln Large-Vocabulary HMM CSR
The work described here focuses on recognition of the Wall Street Journal (WSJ) pilot database [17], a new CSR database which supports 5K, 20K, and up to 64K-word CSR tasks. The original Lincoln Tied-Mixture HMM CSR was implemented using a time-synchronous beam-pruned search of a static network [14] and does not extend well to this task because the recognition network would be too large for curre...
Tied Mixtures in the Lincoln Robust CSR
HMM recognizers using either a single Gaussian or a Gaussian mixture per state have been shown to work fairly well for 1000-word vocabulary continuous speech recognition. However, the large number of Gaussians required to cover the entire English language makes these systems unwieldy for large vocabulary tasks. Tied mixtures offer a more compact way of representing the observation pdf's. We hav...
The Lincoln Continuous Tied-Mixture HMM Speech Recognizer
The Lincoln robust HMM recognizer has been converted from a single Gaussian or Gaussian mixture pdf per state to tied mixtures in which a single set of Gaussians is shared between all states. There were some initial difficulties caused by the use of mixture pruning [12] but these were cured by using observation pruning. Fixed weight smoothing of the mixture weights allowed the use of word-bound...
A Tied-Mixture 2-D HMM Face Recognition System
In this paper, a simplified 2-D second-order Hidden Markov Model (HMM) with tied state mixtures is applied to the face recognition problem. The mixture of the model states is fully-tied across all models for lower complexity. Tying HMM parameters is a well-known solution for the problem of insufficient training data leading to nonrobust estimation. We show that parameter tying in HMM also enhan...
Tied-Posteriors: A New Hybrid Speech Recognition Technology with Generic Capabilities and High Portability
This paper presents a new method for estimating the emission probabilities of general hybrid connectionist/HMM recognition systems. Contrary to the traditional hybrid approach, where a neural network is used for providing posterior probabilities in order to model the emission probabilities of one-state HMMs, our new tied-posterior approach uses the posterior probabilities resulting from the neur...
Publication date: 1991